Fully distributed actor-critic architecture for multitask deep reinforcement learning
Authors
Abstract
We propose a fully distributed actor-critic architecture, named Diff-DAC, with application to multitask reinforcement learning (MRL). During the learning process, agents communicate their value and policy parameters to their neighbours, diffusing information across a network of agents with no need for a central station. Each agent can only access data from its local task, but aims to learn a common policy that performs well for the whole set of tasks. The architecture is scalable, since the computational and communication cost per agent depends on the number of neighbours rather than on the overall number of agents. We derive Diff-DAC from duality theory and provide novel insights into the actor-critic framework, showing that it is actually an instance of the dual ascent method. We prove almost sure convergence under general assumptions that hold even for deep neural network approximations. For more restrictive assumptions, we also prove that this solution is a stationary point of an approximation of the original problem. Numerical results on multitask extensions of continuous control benchmarks demonstrate that Diff-DAC stabilises learning and has a regularising effect that induces higher performance and better generalisation properties than previous architectures.
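To make the diffusion mechanism concrete, the sketch below shows a generic adapt-then-combine diffusion update in Python. The combination matrix W, the local_gradient stand-in, and the toy quadratic objectives are illustrative assumptions, not the authors' implementation; in Diff-DAC the local gradients would come from each agent's actor and critic updates on its own task.

```python
# Minimal sketch of a diffusion-style distributed update (assumed setup).
# Each agent i holds a parameter vector theta[i]; W is a row-stochastic
# combination matrix with W[i][j] > 0 only if j is a neighbour of i.
import numpy as np

def diffusion_step(theta, W, local_gradient, lr=1e-3):
    """One adapt-then-combine diffusion step for all agents.

    theta: array of shape (n_agents, n_params)
    W:     row-stochastic combination matrix of shape (n_agents, n_agents)
    """
    # Adapt: each agent takes a gradient step using only its local task data.
    adapted = np.array([theta[i] - lr * local_gradient(i, theta[i])
                        for i in range(theta.shape[0])])
    # Combine: each agent averages its neighbours' intermediate parameters,
    # diffusing information across the network without a central station.
    return W @ adapted

# Toy usage: 4 agents on a ring, quadratic local objectives with different minima.
n, d = 4, 3
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
targets = np.random.randn(n, d)
grad = lambda i, th: th - targets[i]   # gradient of 0.5 * ||th - target_i||^2
theta = np.zeros((n, d))
for _ in range(200):
    theta = diffusion_step(theta, W, grad, lr=0.1)
# All agents approach a common solution (here, the average of the local minima).
```

The per-agent cost of each step depends only on the number of neighbours, which is what makes the scheme scalable in the number of agents.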
Similar resources
Diff-DAC: Distributed Actor-Critic for Multitask Deep Reinforcement Learning
We propose a multiagent distributed actor-critic algorithm for multitask reinforcement learning (MRL), named Diff-DAC. The agents are connected, forming a (possibly sparse) network. Each agent is assigned a task and has access to data from this local task only. During the learning process, the agents are able to communicate some parameters to their neighbors. Since the agents incorporate their ...
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations
Pretraining with expert demonstrations has been found useful in speeding up the training process of deep reinforcement learning algorithms, since less online simulation data is required. Some approaches use supervised learning to speed up feature learning, while others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-c...
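As an illustration of the pretraining idea described above, here is a minimal behaviour-cloning sketch in PyTorch; the network sizes, the synthetic demonstrations, and the training loop are assumed placeholders rather than the method of the cited paper.

```python
# Minimal behaviour-cloning sketch (assumed setup): pretrain a policy network
# on expert (state, action) pairs before online actor-critic training.
import torch
import torch.nn as nn

state_dim, n_actions = 8, 4                       # assumed dimensions
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, n_actions))  # logits over discrete actions
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder demonstrations; in practice these come from an expert policy.
expert_states = torch.randn(512, state_dim)
expert_actions = torch.randint(0, n_actions, (512,))

for epoch in range(20):
    logits = policy(expert_states)
    loss = loss_fn(logits, expert_actions)        # imitate the expert's actions
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
# The pretrained policy (and its features) can then initialise the actor before
# online reinforcement learning continues.
```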
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent. Towards this goal, we define a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains. This method, termed “A...
Dynamic Control with Actor-Critic Reinforcement Learning
(Table of contents excerpt) 4 Actor-Critic Marble Control: 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...
1 Supervised Actor-Critic Reinforcement Learning
Editor’s Summary: Chapter ?? introduced policy gradients as a way to improve on stochastic search of the policy space when learning. This chapter presents supervised actor-critic reinforcement learning as another method for improving the effectiveness of learning. With this approach, a supervisor adds structure to a learning problem and supervised learning makes that structure part of an actor-...
Journal
Journal title: Knowledge Engineering Review
Year: 2021
ISSN: 0269-8889, 1469-8005
DOI: https://doi.org/10.1017/s0269888921000023